30 research outputs found

Lost in translation: Exposing hidden compiler optimization opportunities

Author: Chamski Zbigniew
Eder Kerstin
Garcia Andres Amaya
Georgiou Kyriakos
May David
Publication venue: Oxford University Press (OUP)
Publication date: 07/07/2020

Existing iterative compilation and machine-learning-based optimization techniques have proven very successful in achieving better optimizations than the standard optimization levels of a compiler. However, they were not engineered to support the tuning of a compiler's optimizer as part of the compiler's daily development cycle. In this paper, we first establish the required properties which a technique must exhibit to enable such tuning. We then introduce an enhancement to the classic nightly routine testing of compilers which exhibits all the required properties and is thus capable of driving the improvement and tuning of the compiler's common optimizer. This is achieved by leveraging resource usage and compilation information collected while systematically exploiting prefixes of the transformations applied at standard optimization levels. Experimental evaluation using the LLVM v6.0.1 compiler demonstrated that the new approach was able to reveal hidden cross-architecture and architecture-dependent potential optimizations on two popular processors: the Intel i5-6300U and the Arm Cortex-A53-based Broadcom BCM2837 used in the Raspberry Pi 3B+. As a case study, we demonstrate how the insights from our approach enabled us to identify and remove a significant shortcoming of the CFG simplification pass of the LLVM v6.0.1 compiler.

Comment: 31 pages, 7 figures, 2 tables. arXiv admin note: text overlap with arXiv:1802.0984

arXiv.org e-Print Archive

Explore Bristol Research

Energy Transparency for Deeply Embedded Programs

Author: Chamski Zbigniew
Eder Kerstin
Georgiou Kyriakos
Kerrison Steven
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/03/2017

Energy transparency is a concept that makes a program's energy consumption visible, from hardware up to software, through the different system layers. Such transparency can enable energy optimizations at each layer and between layers, and help both programmers and operating systems make energy-aware decisions. In this paper, we focus on deeply embedded devices, typically used for Internet of Things (IoT) applications, and demonstrate how to enable energy transparency through existing Static Resource Analysis (SRA) techniques and a new target-agnostic profiling technique, without hardware energy measurements. Our novel mapping technique enables software energy consumption estimations at a higher level than the Instruction Set Architecture (ISA), namely the LLVM Intermediate Representation (IR) level, and therefore introduces energy transparency directly to the LLVM optimizer. We apply our energy estimation techniques to a comprehensive set of benchmarks, including single- and multi-threaded embedded programs from two commonly used concurrency patterns, task farms and pipelines. Using SRA, our LLVM IR results demonstrate a high accuracy with a deviation in the range of 1% from the ISA SRA. Our profiling technique captures the actual energy consumption at the LLVM IR level with an average error of 3%.

Comment: 33 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1510.0709

arXiv.org e-Print Archive

Explore Bristol Research

A Comprehensive and Accurate Energy Model for Arm's Cortex-M0 Processor

Author: Chamski Zbigniew
Eder Kerstin
Georgiou Kyriakos
Nikov Kris
Publication date: 02/04/2021

Energy modeling can enable energy-aware software development and assist the developer in meeting an application's energy budget. Although many energy models for embedded processors exist, most do not account for processor-specific configurations, nor are they suitable for static energy consumption estimation. This paper introduces a comprehensive energy model for Arm's Cortex-M0 processor, ready to support energy-aware development of edge computing applications using either profiling- or static-analysis-based energy consumption estimation. The model accounts for the Frequency, PreFetch, and WaitState processor configurations, which all have a significant impact on the execution time and energy consumption of edge computing applications. All models have a prediction error of less than 5%.

Comment: 10 pages, 1 figure, 2 tables

arXiv.org e-Print Archive

Explore Bristol Research

Accurate Energy Modelling on the Cortex-M0 Processor for Profiling and Static Analysis

Author: Chamski Zbigniew
Eder Kerstin
Georgiou Kyriakos
Nikov Kris
Nunez-Yanez Jose
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2022

Explore Bristol Research

Application Domain-Driven System Design for Pervasive Video Processing

Author: Chamski Zbigniew
Cohen Albert
Duranton Marc
Eisenbeis Christine
Feautrier Paul
Genius Daniela
Publication venue: Kluwer Academic Press
Publication date: 01/01/2003

Pervasive video processing in future Ambient Intelligence environments sets new challenges in embedded system design. In particular, very high performance requirements have to be combined with the constraints of deeply embedded systems, frequently changing operating modes, and low-cost, high-volume production. By leveraging the key properties of the application domain, we devised a computation model, a hardware template, and a programming approach which provide a natural mapping from application requirements to a complete system solution. Our approach enables the direct exploitation of concurrency and regularity in achieving the combined challenge of adaptability, performance, and efficiency.

INRIA a CCSD electronic archive server

Multi-Periodic Process Networks: Prototyping and Verifying Stream-Processing Systems

Author: Chamski Zbigniew
Cohen Albert
Duranton Marc
Feautrier Paul
Genius Daniela
Kortebi Abdesselem
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 27/08/2002

Modeling video and graphic streams with different clocks is largely an open problem. This article proposes a new kind of process network for application modeling, called Hierarchical Process Network. With properties such as abstraction, composition, synchronization and sequencing, hierarchy helps to describe stream-processing applications and deduce parameters such as throughput and buffer sizes more precisely. Real-time is explicit, as well as adaptable degrees of synchronous behavior.

Crossref

INRIA a CCSD electronic archive server

Multi-Periodic Process Networks: Technical Report

Author: Chamski Zbigniew
Cohen Albert
Duranton Marc
Feautrier Paul
Genius Daniela
Kortebi Abdesselem
Publication venue: HAL CCSD
Publication date: 01/01/2002

This paper aims at modeling video stream applications with structured data and multiple clocks. Multi-Periodic Process Networks (MPPN) are real-time process networks with an adaptable degree of synchronous behavior and a hierarchical structure. MPPN help to describe stream-processing applications and deduce resource requirements such as parallel functional units, throughput, and buffer sizes.

INRIA a CCSD electronic archive server

The SANDRA project: cooperative architecture/compiler technology for embedded real-time streaming applications

Author: Chamski Zbigniew
Cohen Albert
Duranton Marc
Eisenbeis Christine
Feautrier Paul
Genius Daniela
Pasquier Laurent
Rivierre-Vier Valérie
Thomasset François
Zhao Qin
Publication venue: HAL CCSD
Publication date: 01/01/2003

The convergence of digital television, Internet access, gaming, and digital media capture and playback stresses the importance of high-quality and high-performance video and graphics processing. The SANDRA project, a collaboration between Philips Research and INRIA, develops a consistent and efficient system design approach for regular, real-time constrained stream processing. The project aims at providing a system template with its associated compiler chain and application development framework, enabling an early validation of both the functional and the non-functional requirements of the application at every system design stage.

INRIA a CCSD electronic archive server

ACOTES project: Advanced compiler technologies for embedded streaming

Author: Albert Cohen
Alex Ramírez
Andrea Ornstein
Antoniu Pop
Ayal Zaks
Cupertino Miranda
Cédric Bastoul
David Ródenas
Dorit Nuzman
E. Blossom
E.A. Lee
Eduard Ayguadé
Erven Rohou
Harm Munk
Ira Rosen
J. Hoogerbrugge
Konrad Trifunović
Louis-Noël Pouchet
M. Gschwind
M. Wolfe
Marc Duranton
Marco Cornero
Menno Lindwer
Mohammed Fellahi
Paul Carpenter
Philippe Dumont
R. Allen
R.G. Scarborough
Razya Ladelsky
Roger Ferrer
S. Campanoni
Sebastian Pop
Uzi Shvadron
Xavier Martorell
Zbigniew Chamski
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Streaming applications are built of data-driven, computational components, consuming and producing unbounded data streams. Streaming-oriented systems have become dominant in a wide range of domains, including embedded applications and DSPs. However, programming efficiently for streaming architectures is a challenging task: the programmer has to carefully partition the computation and map it to processes in a way that best matches the underlying streaming architecture, taking into account the distributed resources (memory, processing, real-time requirements) and communication overheads (processing and delay). These challenges have led to a number of suggested solutions, whose goal is to improve the programmer’s productivity in developing applications that process massive streams of data on programmable, parallel embedded architectures. StreamIt is one such example. Another more recent approach is that developed by the ACOTES project (Advanced Compiler Technologies for Embedded Streaming). The ACOTES approach for streaming applications consists of compiler-assisted mapping of streaming tasks to highly parallel systems in order to maximize cost-effectiveness, both in terms of energy and in terms of design effort. The analysis and transformation techniques automate large parts of the partitioning and mapping process, based on the properties of the application domain, on the quantitative information about the target systems, and on programmer directives. This paper presents the outcomes of the ACOTES project, a 3-year collaborative work of industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners, and advocates the use of Advanced Compiler Technologies that we developed to support Embedded Streaming.

HAL-CentraleSupelec

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

INRIA a CCSD electronic archive server

HAL-MINES ParisTech

The University of Manchester - Institutional Repository

HAL-Rennes 1